ABSTRACT

Zinc-finger recombinases (ZFRs) represent a poten- 
tially powerful class of tools for targeted genetic en- 
gineering. These chimeric enzymes are composed 
of an activated catalytic domain derived from the 
resolvase/invertase family of serine recombinases 
and a custom-designed zinc-finger DNA-binding 
domain. The use of ZFRs, however, has been re- 
stricted by sequence requirements imposed by the 
recombinase catalytic domain. Here, we combine 
substrate specificity analysis and directed evolution 
to develop a diverse collection of Gin recombinase 
catalytic domains capable of recognizing an 
estimated 3.77 x 10^7 unique DNA sequences. We
show that ZFRs assembled from these engineered 
catalytic domains recombine user-defined DNA 
targets with high specificity, and that designed 
ZFRs integrate DNA into targeted endogenous loci 
in human cells. This study demonstrates the feasi- 
bility of generating customized ZFRs and the poten- 
tial of ZFR technology for a diverse range of 
applications, including genome engineering, syn- 
thetic biology and gene therapy. 

INTRODUCTION 

Site-specific DNA recombination systems, such as 
Cre-loxP, FLP-FRT and φC31-att, have emerged as
powerful tools for genetic engineering (1,2). The 
enzymes that promote these conservative DNA rearrange- 
ments — known as site-specific recombinases — recognize 
short (30-40 bp) sequences and coordinate DNA 
cleavage, strand exchange and re-ligation by a mechanism 



that does not require DNA synthesis or a high-energy 
cofactor (3). This simplicity has allowed researchers to 
study gene function with extraordinary spatial and 
temporal sensitivity. However, the strict sequence require- 
ments imposed by site-specific recombinases have limited 
their application to cells and organisms that contain 
artificially introduced recombination sites or pre-existing 
pseudo-recognition sites. To address this limitation,
directed evolution has been used to alter the sequence 
specificity of several site-specific recombinases towards 
naturally occurring DNA sequences (4-8). Yet, despite 
advances (7,8), the widespread adoption of this technol- 
ogy has been hindered by the need for complex mutagen- 
esis and selection strategies (4,7) coupled with the finding 
that re-engineered recombinase variants routinely demon- 
strate relaxed substrate specificity (4,6-8). 

Zinc-finger recombinases (ZFRs) represent a versatile 
alternative to conventional site-specific recombination 
systems (9,10). These chimeric enzymes are composed of 
an activated catalytic domain derived from the resolvase/ 
invertase family of serine recombinases and a zinc-finger 
DNA-binding domain, which can be custom-designed to 
recognize almost any DNA sequence (11-16) (Figure 1A).
ZFRs catalyse recombination between specific ZFR target
sites (17) that consist of two inverted zinc-finger-binding
sites (ZFBS) flanking a central 20-bp core sequence
recognized by the recombinase catalytic domain (18)
(Figure 1B). In contrast to zinc-finger (19-21) and tran-
scription activator-like (TAL) effector nucleases (22,23),
ZFRs function autonomously and can excise and integrate 
transgenes in human and mouse cells without activating 
the cellular DNA damage response pathway (9,24-26). 
However, as with conventional site-specific recombinases, 
applications of ZFRs have been restricted by sequence 
requirements imposed by the recombinase catalytic 
Figure 1. Structure of the zinc-finger recombinase dimer bound to DNA. (A) Each ZFR monomer (blue or orange) consists of an activated serine
recombinase catalytic domain linked to a custom-designed zinc-finger DNA-binding domain. Model was generated from crystal structures of the γδ
resolvase and Aart zinc-finger protein (PDB IDs: 1GDT and 2I13, respectively). (B) Cartoon of the ZFR dimer bound to DNA. ZFR target sites
consist of two inverted ZFBS flanking a central 20-bp core sequence recognized by the ZFR catalytic domain. ZFPs can be designed to recognize
distinct 'left' or 'right' half-sites (blue and orange boxes, respectively). Abbreviations are as follows: N indicates A, T, C or G; R indicates G or A; 
and Y indicates C or T. 



domain, which dictate that ZFR target sites contain a 
20-bp core derived from a native serine resolvase/invertase 
recombination site. 

To address this problem, we previously described a 
knowledge-based approach for re-engineering serine re- 
combinase catalytic specificity (27). This strategy, which 
was based on the saturation mutagenesis of specificity- 
determining DNA-binding residues, was used to generate 
recombinase variants that showed a >10 000-fold shift in
specificity. Significantly, this strategy focused exclusively
on amino acid residues located outside the recombinase
dimer interface (Supplementary Figure S1). As a result, we
found that catalytic domains re-engineered by this method
could associate to form ZFR heterodimers, and that
designed ZFR pairs could recombine pre-determined 
DNA sequences with exceptional specificity. Taken 
together, these results led us to hypothesize that an 
expanded catalogue of specialized catalytic domains 
developed by this method could be used for the design 
of ZFRs with custom specificity. Here, we expand on 
our previous work by combining substrate specificity 
analysis and directed evolution to develop a diverse 



collection of Gin recombinase catalytic domains capable 
of recognizing an estimated 3.77 x 10^7 unique 20-bp core
sequences. We show that ZFRs assembled from these 
re-engineered catalytic domains recombine user-defined 
sequences with high specificity, and that designed ZFRs 
integrate DNA into targeted endogenous loci in human 
cells. To our knowledge, this report describes the first 
generalized approach for the design of customizable 
site-specific recombinases and also provides the first dem- 
onstration of targeted integration into endogenous human 
loci by custom-designed site-specific recombinases. 

MATERIALS AND METHODS 

Plasmids 

The split gene reassembly vector (pBLA) was derived
from pBluescript II SK (-) (Stratagene) and modified to
contain a chloramphenicol resistance gene and an inter-
rupted TEM-1 β-lactamase gene under the control of a lac
promoter. ZFR target sites were introduced as previously
described (8). Briefly, GFPuv (Clontech) was polymerase
chain reaction (PCR) amplified with the primers
GFP-ZFR-XbaI-Fwd and GFP-ZFR-HindIII-Rev and
cloned into the SpeI and HindIII restriction sites of pBLA
to generate pBLA-ZFR substrates. All primer sequences
are provided in Supplementary Table S1.

To generate luciferase reporter plasmids, the Simian
vacuolating virus 40 (SV40) promoter was PCR amplified
from pGL3-Prm (Promega) with the primers SV40-ZFR-
BglII-Fwd and SV40-ZFR-HindIII-Rev. PCR products
were digested with BglII and HindIII and ligated into the
same restriction sites of pGL3-Prm to generate pGL3-
ZFR-1, 2, 3 ... 18. The pBPS-ZFR donor plasmids were
constructed as previously described (24,27) with the fol-
lowing exception: the ZFR-1, 2 and 3 recombination sites
were encoded by primers 3' CMV (Cytomegalovirus)-
PstI-ZFR-1, 2 or 3-Rev. Correct construction of each
plasmid was verified by sequence analysis.

Recombination assays 

ZFRs were assembled by PCR as previously described
(9,27). PCR products were digested with SacI and XbaI
and ligated into the same restriction sites of pBLA.
Ligations were transformed by electroporation into
Escherichia coli TOP10F' (Invitrogen). After 1-h
recovery in Super Optimal Broth with Catabolite
suppression (SOC) medium, cells were incubated with 5
ml of Super broth (SB) medium with 30 µg ml⁻¹ of chlor-
amphenicol and cultured at 37°C. At 16 h, cells were har-
vested; plasmid DNA was isolated by Mini-prep
(Invitrogen); and 200 ng of pBLA was used to transform
E. coli TOP10F'. After 1-h recovery in SOC, cells were
plated on solid Lysogeny broth (LB) media with 30 µg
ml⁻¹ of chloramphenicol or 30 µg ml⁻¹ of chlorampheni-
col and 100 µg ml⁻¹ of carbenicillin, an ampicillin
analogue. Recombination was determined as the number
of colonies on LB media containing carbenicillin and
chloramphenicol divided by the number of colonies on
LB media containing only chloramphenicol. Colony
number was determined by automated counting using
the GelDoc XR Imaging System (Bio-Rad).
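
The recombination frequency defined above is a ratio of two colony counts. A minimal sketch in Python is given here for illustration; the function name and the plate counts are hypothetical and are not taken from this study.

```python
def recombination_frequency(carb_cam_colonies: int, cam_colonies: int) -> float:
    """Fraction of transformants carrying a recombined (reassembled) plasmid.

    carb_cam_colonies: colonies on LB with carbenicillin and chloramphenicol
    cam_colonies:      colonies on LB with chloramphenicol only
    """
    if cam_colonies <= 0:
        raise ValueError("chloramphenicol-only plate must have at least one colony")
    if carb_cam_colonies < 0:
        raise ValueError("colony counts cannot be negative")
    return carb_cam_colonies / cam_colonies


# Hypothetical plate counts, for illustration only.
frequency = recombination_frequency(carb_cam_colonies=120, cam_colonies=1000)
print(f"Recombination: {frequency:.1%}")
```

Because both plates are seeded from the same transformation, the ratio corrects for differences in transformation efficiency between samples.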

Selections 

The ZFR library was constructed by overlap extension
PCR as previously described (27). Mutations were
introduced into the Gin catalytic domain at positions
120, 123, 127, 136 and 137 with the degenerate codon
NNK (N: A, T, C or G and K: G or T), which encodes
all 20 amino acids. PCR products were digested with SacI
and XbaI and ligated into the same restriction sites of
pBLA. Ligations were ethanol precipitated and used to
transform E. coli TOP10F'. Library size was routinely
determined to be ~5 x 10^. After 1-h recovery in SOC
medium, cells were incubated in 100 ml of SB medium
with 30 µg ml⁻¹ of chloramphenicol at 37°C. At 16 h, 30
ml of cells were harvested; plasmid DNA was isolated by
Mini-prep; and 3 µg of plasmid DNA was used to transform
E. coli TOP10F'. After 1-h recovery in SOC, cells were
incubated with 100 ml of SB medium with 30 µg ml⁻¹ of
chloramphenicol and 100 µg ml⁻¹ of carbenicillin at 37°C.
At 16 h, cells were harvested, and plasmid DNA was
isolated by Maxi-prep (Invitrogen). Enriched ZFRs were
isolated by SacI and XbaI digestion and ligated into fresh
pBLA for further selection. After four rounds of selection,
sequence analysis was performed on individual
carbenicillin-resistant clones. Recombination assays were
performed as described earlier in the text.

ZFR construction 

Recombinase catalytic domains were PCR amplified from
their respective pBLA selection vector with the primers
5' Gin-HBS-Koz and 3' Gin-AgeI-Rev. PCR products
were digested with HindIII and AgeI and ligated into
the same restriction sites of pBH (9) to generate the
SuperZiF-compatible subcloning plasmids: pBH-Gin-α,
β, γ, δ, ε or ζ. Zinc-fingers were assembled by SuperZiF
(28) and ligated into the AgeI and SpeI restriction sites of
pBH-Gin-α, β, γ, δ, ε or ζ to generate pBH-ZFR-L/R-1, 2,
3 ... 18 (L: left ZFR; R: right ZFR) (Supplementary Table
S2). ZFR genes were released from pBH by SfiI digestion
and ligated into pcDNA 3.1 (Invitrogen) to generate
pcDNA-ZFR-L/R-1, 2, 3 ... 18. Correct construction
of each ZFR was verified by sequence analysis
(Supplementary Table S3).

Luciferase assays 

Human embryonic kidney (HEK) 293 and 293T cells
(ATCC) were maintained in Dulbecco's modified Eagle's
medium containing 10% (vol/vol) Fetal Bovine Serum
(FBS) and 1% (vol/vol) Antibiotic-Antimycotic
(Anti-Anti; Gibco). HEK293T cells were seeded onto
96-well plates at a density of 4 x 10'' cells per well and
established in a humidified 5% CO2 atmosphere at 37°C.
At 24 h after seeding, cells were transfected with 150 ng of
pcDNA-ZFR-L 1-18, 150 ng of pcDNA-ZFR-R 1-18,
2.5 ng of pGL3-ZFR-1, 2, 3 ... or 18 and 1 ng of pRL-
CMV using Lipofectamine 2000 (Invitrogen) according to
the manufacturer's instructions. At 48 h after transfection,
cells were lysed with Passive Lysis Buffer (Promega), and
luciferase expression was determined with the
Dual-Luciferase Reporter Assay System (Promega) using
a Veritas Microplate Luminometer (Turner Biosystems).

Integration assays 

HEK293 cells were seeded onto 6-well plates at a density
of 5x10^ cells per well and maintained in serum-
containing media in a humidified 5% CO2 atmosphere
at 37°C. At 24 h after seeding, cells were transfected with
1 ng of pcDNA-ZFR-L-1, 2 or 3 and 1 ng of pcDNA-
ZFR-R-1, 2 or 3 and 200 ng of pBPS-ZFR-1, 2 or 3
using Lipofectamine 2000 according to the manufacturer's
instructions. At 48 h after transfection, cells were split
onto 6-well plates at a density of 5x10'' cells per well
and maintained in serum-containing media with 2 µg ml⁻¹
of puromycin. Cells were harvested on reaching 100%
confluence, and genomic DNA was isolated with the
Quick Extract DNA Extraction Solution (Epicentre).
ZFR targets were PCR amplified with the following
primer combinations: ZFR-Target-1, 2 or 3-Fwd and
ZFR-Target-1, 2 or 3-Rev (Unmodified target); ZFR-
Target-1, 2 or 3-Fwd and CMV-Mid-Prim-1 (Forward
integration); and CMV-Mid-Prim-1 and ZFR-Target-1,
2 or 3-Rev (Reverse integration) using the Expand High
Fidelity Taq System (Roche). For clonal analysis, at
2 days post-transfection, 1 x 10^ cells were split onto a 
100-mm dish and maintained in serum-containing media 
with 2 µg ml⁻¹ of puromycin. Individual colonies were
isolated with 10- x 10-mm open-ended cloning cylinders 
with sterile silicone grease (Millipore) and expanded in 
culture. Cells were harvested on reaching 100% conflu- 
ence, and genomic DNA was isolated and used as 
template for PCR, as described earlier in the text. For 
colony counting assays, at 2 days post-transfection, cells
were split into 6-well plates at a density of 1 x 10"^ cells per
well and maintained in serum-containing media with or
without 2 µg ml⁻¹ of puromycin. At 16 days, cells were
stained with a 0.2% crystal violet solution, and genome- 
wide integration rates were determined by counting the 
number of colonies formed in puromycin-containing 
media divided by the number of colonies formed in the 
absence of puromycin. Colony number was determined by 
automated counting using the GelDoc XR Imaging 
System (Bio-Rad). 

RESULTS 

Specificity profile of the Gin recombinase 

To effectively re-engineer serine recombinase catalytic spe- 
cificity, we first sought to develop a detailed understanding 
of the factors underlying substrate recognition by this 
family of enzymes. To accomplish this, we evaluated the
ability of an activated mutant of the catalytic domain of 
the DNA invertase Gin (29) to recombine an extensive set 
of symmetrically substituted target sites. In nature, the 
Gin catalytic domain recombines a pseudo-symmetric 
20-bp core that consists of two 10-bp half-site regions. 
Our collection of mutant recombination sites, therefore, 
contained each possible single-base substitution at pos- 
itions 10, 9, 8, 7, 6, 5 and 4 and each possible two-base 
combination at positions 3 and 2 and the dinucleotide 
core. We determined recombination by split gene reassem- 
bly (8), a previously described method that links recom- 
binase activity to antibiotic resistance. 

In general, we found that Gin tolerates: (i) 12 of the 
16 possible two-base combinations at the dinucleotide 
core (AA, AT, AC, AG, TA, TT, TC, TG, CA, CT, GA 
and GT); (ii) 4 of the 16 possible two-base combinations at 
positions 3 and 2 (CC, CG, GG and TG); (iii) a single A to 
T substitution within positions 6, 5, or 4; and (iv) all
16 possible single-base combinations at positions 10, 9, 
8, and 7 (Figure 2A-D). Furthermore, we found that 
Gin recombined a target site library containing >10'' (of
a possible 4.29 x 10^9) unique base combinations at pos-
itions 10, 9, 8 and 7 within each 20-bp target
(Figure 2D). These findings are consistent with observa-
tions made from crystal structures of the γδ resolvase
(30,31), which indicate that (i) the interactions made by 
the recombinase dimer across the dinucleotide core are 
asymmetric and predominately non-specific; (ii) the inter- 
actions between an evolutionarily conserved Gly-Arg 
motif in the recombinase arm region and the DNA 
minor groove impose a requirement for adenine or 



thymine at positions 6, 5 and 4; and (iii) there are no 
sequence-specific interactions between the arm region 
and the minor groove at positions 10, 9, 8 or 7 
(Figure 2E). These results are also consistent with 
studies that focused on determining the DNA-binding 
properties of the closely related Hin recombinase (32-34). 

Re-engineering Gin recombinase catalytic specificity 

Based on the finding that Gin tolerates conservative sub- 
stitutions at positions 3 and 2 (i.e. CC, CG, GG and TG), 
we next investigated whether Gin catalytic specificity 
could be re-engineered to recognize core sequences con- 
taining each of the 12 base combinations not tolerated by 
the native enzyme (Figure 3A). To identify the specific 
amino acid residues involved in DNA recognition by 
Gin, we examined the crystal structures of two related 
serine recombinases, the γδ resolvase (30) and Sin recom-
binase (35), in complex with their respective DNA targets.
Based on these models, we identified five residues that
contact DNA at positions 3 and 2: Leu 123, Thr 126,
Arg 130, Val 139 and Phe 140 (numbered according to
the γδ resolvase) (Figure 3B). We randomly mutagenized
the equivalent residues in the Gin catalytic domain (Ile
120, Thr 123, Leu 127, Ile 136 and Gly 137) by overlap
extension PCR and constructed a library of ZFR mutants
by fusing these catalytic domain variants to an unmodified
copy of the 'H1' zinc-finger protein (ZFP) (9), which rec-
ognizes the sequence 5'-GGAGGCGTG-3'. The theoret-
ical size of this library was 3.3 x 10^7 variants.
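
The properties of the NNK codon and the theoretical library size can be checked by direct enumeration. The sketch below is illustrative only: it assumes the standard genetic code and that the quoted library size counts codon-level (DNA) diversity over the five randomized positions, which gives 32^5, or about 3.36 x 10^7. All variable names are our own.

```python
from itertools import product

# Standard genetic code; codons are ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# NNK: N is any base, K is G or T.
nnk_codons = ["".join(c) for c in product("ACGT", "ACGT", "GT")]
amino_acids = sorted({CODON_TABLE[c] for c in nnk_codons} - {"*"})
stop_codons = [c for c in nnk_codons if CODON_TABLE[c] == "*"]

randomized_positions = 5  # Gin positions 120, 123, 127, 136 and 137
codon_diversity = len(nnk_codons) ** randomized_positions
protein_diversity = len(amino_acids) ** randomized_positions

print(f"NNK codons: {len(nnk_codons)}")
print(f"Amino acids encoded: {len(amino_acids)}")
print(f"Stop codons retained: {stop_codons}")
print(f"Codon-level library size: {codon_diversity:.3g}")
print(f"Protein-level library size: {protein_diversity:.3g}")
```

NNK keeps all 20 amino acids while retaining only one stop codon (TAG), which is why it is often preferred over fully random NNN codons for saturation mutagenesis.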

We cloned the ZFR library into substrate plasmids con-
taining one of five base combinations not tolerated by the
native enzyme (GC, GT, CA, AC or TT) and enriched for
active ZFRs by split gene reassembly (8) (Figure 3C).
After four rounds of selection, we found that the activity
of each ZFR population increased >1000-fold on DNA
targets containing GC, GT, CA and TT substitutions and
>100-fold on a DNA target containing AC substitutions
(Figure 3D). We sequenced individual recombinase 
variants from each population and found that a high 
level of amino acid diversity was present at positions 
120, 123 and 127, and that >80% of selected clones con- 
tained Arg at position 136 and Trp or Phe at position 137 
(Supplementary Figure S2). These results suggest that pos- 
itions 120, 123 and 127 play critical roles in the specific 
recognition of unnatural core sequences, and that pos- 
itions 136 and 137 are important structural determinants 
for DNA-binding. We evaluated the ability of each 
selected enzyme to recombine its target DNA and found 
that nearly all recombinases showed high activity 
(>10% recombination) and displayed a >1000-fold shift
in specificity towards their intended core sequence 
(Supplementary Figure S3). As with the parental Gin, 
we found that several recombinases tolerated conservative 
substitutions at positions 3 and 2 (i.e. cross-reactivity 
against GT and CT or AC and AG), indicating that a 
single re-engineered catalytic domain could be used to 
target multiple core sites (Supplementary Figure S3). 

To further investigate recombinase specificity, we 
determined the recombination profiles of five Gin 
variants (hereafter designated Gin β, γ, δ, ε and ζ)
Figure 2. Specificity of the Gin recombinase catalytic domain. (A-D) Recombination was measured on DNA targets that contained (A) each
possible two-base combination at the dinucleotide core, (B) each possible two-base combination at positions 3 and 2, (C) each possible single-base
substitution at positions 6, 5 and 4 and (D) each possible single-base substitution at positions 10, 9, 8 and 7. Substituted bases are boxed above each
panel. Recombination was evaluated by split gene reassembly and measured as the ratio of carbenicillin-resistant to chloramphenicol-resistant
transformants ('Materials and Methods' section). Dotted lines indicate threshold for which sequences were considered non-functional. Error bars
indicate standard deviation (n = 3). (E) Interactions between the γδ resolvase dimer and DNA at (left) the dinucleotide core, (middle) positions 6, 5
and 4 and (right) positions 10, 9, 8 and 7 (PDB ID: 1GDT). Interacting residues are shown as magenta sticks. Bases are coloured as follows: A,
yellow; T, blue; C, brown; and G, pink.



shown to recognize 9 of the 12 possible two-base combin- 
ations at positions 3 and 2 not tolerated by the parental 
enzyme (GC, TC, GT, CT, GA, CA, AG, AC and TT) 
(Table 1). We found that Gin β, γ and ζ recombined their
intended core sequences with activity and specificity near
that of the parental enzyme (hereafter referred to as Gin
α), and that Gin γ, δ and ζ were able to recombine their
intended core sequences with specificity exceeding that
of Gin α (Figure 3E). Each recombinase displayed a
>1000-fold preference for adenine or thymine at positions
6, 5 and 4 and showed no base preference at positions 10, 
9, 8 and 7 (Supplementary Figure S4). These results 
indicate that mutagenesis of the DNA-binding arm 
allows for reprogramming of recombinase specificity at 
positions 3 and 2 without compromising recognition else- 
where. We were unable to select for Gin variants capable 
of tolerating AA, AT or TA substitutions at positions 



3 and 2. One possibility for this result is that DNA 
targets containing >4 consecutive A-T base pairs might 
exhibit bent DNA conformations that interfere with 
recombinase binding and/or catalysis. 

Engineering ZFRs to recombine user-defined sequences 

We next investigated whether ZFRs composed of the 
re-engineered catalytic domains could recombine pre- 
determined sequences. To test this possibility, we 
searched the human genome (GRCh37 primary reference 
assembly) for potential ZFR target sites using a 44-bp con- 
sensus recombination site predicted to occur approxi- 
mately once every 7.44x10* bp of random DNA 
(Figure 4A). This ZFR consensus target site, which was 
derived from the core sequence profiles of the selected 
Gin variants, includes ~3.77 x 10^7 (of a possible
1.0995 x 10^12) unique 20-bp core combinations predicted
to be tolerated by the 21 possible catalytic domain combin-
ations and conservatively excludes low-affinity or unavail-
able 5'-CNN-3' and 5'-TNN-3' triplets within each ZFBS.
Using ZFP specificity as the primary determinant for selec-
tion (36), we identified 18 possible ZFR target sites across



Table 1. Catalytic domain substitutions and intended DNA targets 
eight human chromosomes (Chromosomes 1, 2, 4, 6, 7, 11,
13 and X) at non-protein coding loci. On average, each 
20-bp core showed ~46% sequence identity to the core 
sequence recognized by the native Gin catalytic domain 
(Figure 4B). We constructed each corresponding ZFR 
by modular assembly (28) ('Materials and Methods' 
section). 
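
Some of the counts quoted for the consensus target site can be reproduced by simple enumeration. The sketch below is a hedged illustration: it assumes that the 21 catalytic domain combinations are the unordered pairs (including homodimers) that can be drawn from the six domains Gin α to ζ, that the total number of cores is 4^20, and that the permitted ZFBS triplets are those beginning with G or A. The variable names are hypothetical.

```python
from itertools import combinations_with_replacement, product

# Six catalytic domains: the parental enzyme plus five selected variants.
catalytic_domains = ["alpha", "beta", "gamma", "delta", "epsilon", "zeta"]

# Unordered 'left'/'right' pairs, homodimers included (assumed interpretation).
domain_pairs = list(combinations_with_replacement(catalytic_domains, 2))

# Every possible 20-bp core sequence.
core_length = 20
total_cores = 4 ** core_length

# Zinc-finger triplets remaining after excluding 5'-CNN-3' and 5'-TNN-3'.
all_triplets = ["".join(t) for t in product("ACGT", repeat=3)]
allowed_triplets = [t for t in all_triplets if t[0] not in "CT"]

print(f"Catalytic domain combinations: {len(domain_pairs)}")
print(f"Possible 20-bp cores: {total_cores:.4e}")
print(f"Permitted ZFBS triplets: {len(allowed_triplets)} of {len(all_triplets)}")
```

Under these assumptions the enumeration yields 21 domain pairs and roughly 1.10 x 10^12 possible cores, and half of the 64 triplets remain available for each zinc-finger position.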

To determine whether each ZFR pair could recombine 
its intended DNA target, we developed a transient 
reporter assay that correlates ZFR-mediated recombin- 
ation to reduced luciferase expression (Figure 4A and 
Supplementary Figure S5). To accomplish this, we
introduced ZFR target sites upstream and downstream of
an SV40 promoter that drives expression of a luciferase
reporter gene. HEK293T cells were co-transfected with 
expression vectors for each ZFR pair and the correspond- 
ing reporter plasmid. Luciferase expression was measured 
48 h after transfection. Of the 18 ZFR pairs analysed, 38% 
(7 of 18) reduced luciferase expression by >75-fold and 
22% (4 of 18) decreased luciferase expression by 
>140-fold (Figure 4B). In comparison, GinC4, a positive
ZFR control designed to target the core sequence 
recognized by the native Gin catalytic domain, reduced 
Figure 3. Re-engineering Gin recombinase catalytic specificity. (A) The canonical 20-bp core recognized by the Gin catalytic domain. Positions 3 and
2 are boxed. (B) (Top) Structure of the γδ resolvase in complex with DNA (PDB ID: 1GDT). Arm region residues selected for mutagenesis are
shown as magenta sticks. (Bottom) Sequence alignment of the γδ resolvase and Gin recombinase catalytic domains. Conserved residues are shaded
orange. Black arrows indicate arm region positions selected for mutagenesis. (C) Schematic representation of the split gene reassembly selection
system. Expression of active ZFR variants leads to restoration of the β-lactamase reading frame and host-cell resistance to ampicillin. Solid lines
indicate the locations and identity of the ZFR target sites. Positions 3 and 2 are underlined. (D) Selection of Gin mutants that recombine core sites
containing GC, GT, CA, TT and AC base combinations at positions 3 and 2. Asterisks indicate selection steps in which incubation time was
decreased from 16 h to 6 h ('Materials and Methods' section). (E) Recombination specificity of the selected catalytic domains (β, γ, δ, ε and ζ;
wild-type Gin indicated by α) for each possible two-base combination at positions 3 and 2. Intended DNA targets are underlined. Recombination
was determined by split gene reassembly and performed in triplicate.
Figure 4. ZFRs recombine user-defined sequences in mammalian cells. (A) Schematic representation of the luciferase reporter system used to
evaluate ZFR activity in mammalian cells. ZFR target sites flank an SV40 promoter that drives luciferase expression. Solid lines denote the
44-bp consensus target sequence used to identify potential ZFR target sites. The consensus ZFR target site consists of two inverted 12-bp ZFBS
flanking a central 20-bp core sequence recognized by the ZFR catalytic domain. Underlined bases indicate zinc-finger targets and positions 3 and 2. 
(B) Fold-reduction of luciferase expression in HEK293T cells co-transfected with designed ZFR pairs and their cognate reporter plasmid. 
Fold-reduction was normalized to transfection with empty vector and reporter plasmid. Renilla luciferase expression was used to normalize for 
transfection efficiency and cell number. The sequence identity and chromosomal location of each ZFR target site and the catalytic domain com- 
position of each ZFR pair are shown. Underlined bases indicate positions 3 and 2. Standard errors were calculated from three independent 
experiments. ZFR amino acid sequences are provided in Supplementary Table S3. (C) Specificity of ZFR pairs. Fold-reduction of luciferase 
expression was measured for ZFR pairs 1 through 9 and GinC4 for each non-cognate reporter plasmid. Recombination was normalized to the 
fold-reduction of each ZFR pair with its cognate reporter plasmid. Assays were performed in triplicate. 



luciferase expression by 107-fold. Overall, we found that 
50% (9 of 18) of the evaluated ZFR pairs decreased 
luciferase expression by >20-fold. The remaining ZFR 
pairs, however, had a negligible effect on luciferase expres-
sion. Importantly, virtually every catalytic domain that
displayed significant activity in bacterial cells (>20% re-
combination) was successfully used to recombine at least
one naturally occurring sequence in mammalian cells. 
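
The fold-reduction readout used in the reporter assay can be expressed as a ratio of Renilla-normalized firefly signals. The following is a minimal sketch under that assumption; the function names and luminometer readings are hypothetical and serve only to illustrate the arithmetic.

```python
def normalized_luciferase(firefly: float, renilla: float) -> float:
    """Firefly signal corrected for transfection efficiency and cell number."""
    if renilla <= 0:
        raise ValueError("Renilla reading must be positive")
    return firefly / renilla


def fold_reduction(empty_vector: tuple[float, float],
                   zfr_pair: tuple[float, float]) -> float:
    """Fold-reduction of reporter expression relative to the empty-vector control.

    Each argument is a (firefly, renilla) pair of raw readings.
    """
    control_signal = normalized_luciferase(*empty_vector)
    zfr_signal = normalized_luciferase(*zfr_pair)
    if zfr_signal <= 0:
        raise ValueError("normalized ZFR signal must be positive")
    return control_signal / zfr_signal


# Hypothetical readings (relative light units), for illustration only.
empty_vector_reading = (50_000.0, 1_000.0)  # reporter plus empty vector
zfr_pair_reading = (400.0, 800.0)           # reporter plus left and right ZFR

print(f"Fold-reduction: {fold_reduction(empty_vector_reading, zfr_pair_reading):.0f}")
```

Dividing by the Renilla signal before taking the ratio prevents a well that simply received less DNA, or contained fewer cells, from being scored as a recombination event.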

To evaluate ZFR specificity, we separately co-
transfected HEK293T cells with expression plasmids for
the nine most active ZFRs with each non-cognate reporter
plasmid. Every ZFR pair demonstrated high specificity
for its intended DNA target, and 77% (7 of 9) of the
evaluated ZFRs showed an overall recombination specifi-
city nearly identical to that of the positive control, GinC4
(Figure 4C). To establish that reduced luciferase expres-
sion was the product of the intended ZFR heterodimer
and not the byproduct of recombination-competent ZFR 
homodimers, we measured the contribution of each ZFR 
monomer to recombination. Co-transfection of the ZFR 1 
'left' monomer with its corresponding reporter plasmid led 
to nearly a 130-fold reduction in luciferase expression 
(total contribution to recombination: ~22%), but the 
vast majority of individual ZFR monomers (16 of 18) 
did not significantly contribute to recombination (<10% 
recombination), and many (7 of 18) showed no activity 
(Supplementary Figure S6). Taken together, these 
studies indicate that ZFRs can be engineered to recombine 
user-defined sequences with high specificity. 



Engineered ZFRs target integration into the human genome 

We next evaluated whether ZFRs could integrate DNA 
into endogenous loci in human cells. To accomplish this,
we co-transfected HEK293 cells with ZFR expression
vectors and a corresponding DNA donor plasmid that con-
tained a specific ZFR target site and a puromycin-
resistance gene under the control of an SV40 promoter
(24) (Figure 5A). For this analysis, we used ZFR pairs 1,
2 and 3, which were designed to target non-protein coding
loci on human chromosomes 4, X and 4, respectively
(Figure 5A). At 2 days post-transfection, we incubated
cells with puromycin-containing media and measured
genome-wide integration rates by determining the
number of puromycin-resistant (puro^R) colonies. We
found that (i) co-transfection of the donor plasmid and
the corresponding ZFR pair led to a >12-fold increase in
puro^R colonies in comparison with transfection with donor
plasmid only, and (ii) co-transfection with both ZFRs led
to a 6- to 9-fold increase in puro^R colonies in comparison
with transfection with individual ZFR monomers
(Figure 5B). The overall integration rates for ZFR pairs
1, 2 and 3 were determined to be 0.14 ± 0.06%,
0.24 ± 0.02% and 0.31 ± 0.1%, respectively. By compari-
son, the genome-wide integration rate of our internal ZFR
positive control, GinC4, towards a pre-introduced target
site (24,25) was previously determined to be ~1%. To
evaluate whether each ZFR pair correctly targeted integra-
tion, we isolated genomic DNA from puro^R populations
and amplified the targeted loci by PCR. The PCR products
Figure 5. ZFRs target integration into the human genome. (A) Schematic representation of the donor plasmid (top) and the genomic loci targeted
by ZFRs 1, 2 and 3. Open boxes indicate neighbouring exons. Arrows indicate transcript direction. The sequence and location of each ZFR target is 
shown. Underlined bases indicate zinc-finger targets and positions 3 and 2. (B) Genome-wide ZFR-mediated integration rates. Data were normalized 
to data from cells transfected with donor plasmid only. Error bars indicate standard deviation (n = 3). (C) PCR analysis of ZFR-mediated inte- 
gration. PCR primer combinations amplified (top) unmodified locus or integrated plasmid in (middle) the forward or (bottom) the reverse orien- 
tation. (D) Representative chromatograms of PCR-amplified integrated donor for ZFRs 1 and 3. Arrows indicate sequencing primer orientation. 
Shaded boxes denote genomic target sequences. 



corresponding to integration in the forward and reverse 
orientation were observed at the loci targeted by ZFR 
pairs 1 and 2 (Figure 5C). ZFR pair 3 was found to 
target integration only in the reverse orientation. The 
reason for this bias remains unclear, but it could be 
explained by preferential formation of a particular 
synaptic complex topology (37). To determine the overall 
specificity of ZFR-mediated integration, we isolated 
genomic DNA from clonal cell populations and evaluated 
plasmid insertion by PCR. This analysis revealed targeting 
specificities of 14.2% (5 of 35 clones), 8.3% (1 of 12 clones) 
and 9.1% (1 of 11 clones) for ZFR pairs 1, 2 and 3, respect-
ively (Supplementary Figure S7). Sequence analysis of
each PCR product confirmed ZFR-mediated integration 
(Figure 5D); however, we observed mutations within the 
donor plasmid near the anticipated junctions for each
ZFR pair. The mechanism underlying how these mutations 
were introduced remains unknown. Taken together, these 
results indicate that ZFRs can be designed to integrate 



DNA into endogenous loci. Finally, we note that the 
ZFR-1 'left' monomer was found to target integration 
into the ZFR-1 locus in the absence of the corresponding 
'right' ZFR monomer (Figure 5C). This result is consistent 
with the luciferase reporter studies described earlier in the 
text (Supplementary Figure S6) and indicates that 
recombination-competent ZFR homodimers have the 
capacity to mediate off-target integration. The comprehen- 
sive evaluation of off-target integration events and the 
development of optimized obligate heterodimeric ZFR 
architectures should lead to the design of ZFRs that 
show greater targeting efficiency and specificity. 
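
The targeting specificities quoted above follow directly from the clone counts. A minimal sketch of that arithmetic is given here, assuming only the counts reported in the text; the variable and function names are illustrative and do not come from the study. Note that 5 of 35 evaluates to 14.29%, which the text reports as 14.2%.

```python
# Clone counts as reported in the text: (targeted clones, clones screened).
clone_counts = {
    "ZFR pair 1": (5, 35),
    "ZFR pair 2": (1, 12),
    "ZFR pair 3": (1, 11),
}


def targeting_specificity(targeted, screened):
    """Return the percentage of screened clones that carry the targeted insertion."""
    return 100.0 * targeted / screened


# Percentage of correctly targeted clones for each ZFR pair.
specificities = {
    pair: targeting_specificity(targeted, screened)
    for pair, (targeted, screened) in clone_counts.items()
}

for pair, value in specificities.items():
    print(f"{pair}: {value:.2f}%")
```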

DISCUSSION 

Targeted genome engineering is driving progress in new 
areas of research in gene therapy, synthetic biology and 
basic science. Although improvements in the design and 
assembly of zinc-finger and TAL effector nucleases have 



Nucleic Acids Research, 2013, Vol. 41, No. 6 3945 



been central to this revolution, the development of new 
methods that do not rely on DNA double-strand breaks, 
and thus do not carry the risk of non-homologous end 
joining-mediated mutagenesis, is necessary to improve 
the safety of genome engineering. ZFRs capable of 
autonomously catalysing recombination between DNA 
targets represent one such alternative. Yet, despite their 
promise, the use of ZFRs has been limited by the strict 
sequence requirements imposed by the ZFR catalytic 
domain. In the present study, we have addressed this 
problem by combining substrate specificity analysis and 
directed evolution to establish a user-friendly toolbox of 
modified serine recombinase catalytic domains suitable 
for the design of ZFRs with custom specificity. Guided 
by an extensive evaluation of serine recombinase catalytic 
specificity, we have developed a collection of re-engineered 
Gin recombinase catalytic domains that recognize an 
estimated 3.77 x 10^ unique 20-bp core sequences. We 
have shown that ZFRs assembled from these re-engineered 
catalytic domains recombine user-defined sequences with 
high specificity and that designed ZFRs integrate DNA 
into pre-determined endogenous loci in human cells. 
Although previous studies have shown that site-specific 
recombinases, such as the φC31 integrase, can mediate 
integration into the human (38) and mouse genomes (39), these 
efforts were based on the presence of pseudo-recognition 
sites tolerated by the native enzyme (40), did not require 
catalytic reprogramming, and thus did not allow for tar- 
geting of user-defined sequences. To our knowledge, this 
report describes the first general approach for the design 
of site-specific recombinases with customizable specificity 
and also provides the first demonstration of targeted 
integration into endogenous human loci by customized 
site-specific recombinases. 

Based on our current archive of >45 pre-selected zinc- 
finger modules, we estimate that ZFRs can now be designed 
to recognize between 5000 and 20 000 unique 44-bp DNA 
sequences in the human genome (Supplementary Note). 
This corresponds to approximately one potential ZFR 
target site for every 160 000-620 000 bp of random 
sequence and represents a substantial improvement in tar- 
geting capacity compared with conventional site-specific 
recombinases, which typically require complex evolutionary 
methods for reprogramming (4,7). Currently, the requirement 
for adenine by the Gin recombinase within 
positions 6, 5 and 4 represents the only major sequence 
restriction with the strategy described. To alleviate this con- 
straint, structurally and functionally related serine recom- 
binase variants (18) with broad or complementary 
sequence requirements at these positions could be subjected 
to the types of directed evolution described in this study. 
This approach may effectively expand the targeting reper- 
toire of this custom-designed site-specific recombinase 
family. Additional improvements in the targeting 
capacity of this technology could be envisioned with the 
incorporation of alternate DNA-binding domains; in par- 
ticular, we anticipate that the re-engineered catalytic 
domains described herein should be compatible with 
recently described TAL effector recombinases (41). 
Application of more sophisticated and high-throughput 
methods for specificity profiling (42) should lead to more 



effective use of the evolved catalytic domains and may also 
improve ZFR activity. Finally, although the efficiency of 
ZFR-mediated integration is lower than that achieved by 
zinc-finger (43,44) or TAL effector (22) nuclease-based 
approaches, we anticipate that optimization of the ZFR 
architecture will lead to reduced off-target integration 
events and higher targeting efficiency. Additional studies 
aimed at evaluating whether ZFR activity is cell type (25) or 
chromatin structure dependent (45) may also help establish 
limitations and clarify opportunities for ZFR targeting. 
In conclusion, we have developed a diverse collection of 
re-engineered Gin recombinase catalytic domains suitable 
for the design of ZFRs with custom specificity. We 
have shown that ZFRs can be assembled to recombine 
user-defined DNA targets, and that designed ZFRs inte- 
grate DNA into endogenous genomic loci. This work 
illustrates the potential of ZFRs for a wide range of applications, 
including genome engineering, synthetic biology 
and gene therapy. 
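
The target-site density quoted in the Discussion (one potential ZFR site per 160 000-620 000 bp) can be roughly reproduced from the estimated 5000-20 000 designable sites. The sketch below is a back-of-the-envelope check only: the genome length of about 3.1 x 10^9 bp is an assumption introduced here, not a value taken from the paper, and the actual derivation is described in the Supplementary Note. All names are illustrative.

```python
# Assumption (not from the paper): approximate haploid human genome length in bp.
GENOME_LENGTH_BP = 3.1e9

# Range of designable 44-bp ZFR target sites quoted in the Discussion.
MIN_SITES = 5_000
MAX_SITES = 20_000


def bp_per_site(genome_length_bp, n_sites):
    """Return the average number of base pairs per target site."""
    return genome_length_bp / n_sites


# Fewer sites gives a sparser spacing; more sites gives a denser spacing.
sparsest_spacing = bp_per_site(GENOME_LENGTH_BP, MIN_SITES)
densest_spacing = bp_per_site(GENOME_LENGTH_BP, MAX_SITES)

print(f"One site per {densest_spacing:,.0f} to {sparsest_spacing:,.0f} bp")
```

Under this assumed genome length the spacing comes out at about 155 000 to 620 000 bp per site, which is in line with the range stated in the text.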

SUPPLEMENTARY DATA 

Supplementary Data are available at NAR Online: 
Supplementary Tables 1-3, Supplementary Figures 1-7 
and Supplementary Note. 